VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought
Sarch, Gabriel, Jang, Lawrence, Tarr, Michael J., Cohen, William W.
Large-scale generative language and vision-language models (LLMs and VLMs) excel in few-shot in-context learning for decision making and instruction following. However, they require high-quality exemplar demonstrations to be included in their context window. In this work, we ask: Can LLMs and VLMs generate their own examples from generic, sub-optimal demonstrations? We propose In-Context Abstraction Learning (ICAL), a method that builds a memory of multimodal experience from sub-optimal demonstrations and human feedback. Given a task demonstration that may contain inefficiencies or mistakes, a VLM abstracts the trajectory into a generalized program by correcting inefficient actions and annotating cognitive abstractions: causal relationships, object state changes, temporal subgoals, and task-relevant visual elements. These abstractions are iteratively improved and adapted through human feedback while the agent attempts to execute the trajectory in a similar environment. The resulting examples, when used as exemplars in the prompt, significantly improve decision-making in retrieval-augmented LLM and VLM agents. Moreover, as the agent's library of examples grows, it becomes more efficient, relying less on human feedback and requiring fewer environment interactions per demonstration.
The National Institute of Standards and Technology Braces for Mass Firings
Sweeping layoffs architected by the Trump administration and the so-called Department of Government Efficiency may be coming as soon as this week at the National Institute of Standards and Technology (NIST), a non-regulatory agency responsible for establishing benchmarks that ensure everything from beauty products to quantum computers is safe and reliable. According to several current and former employees at NIST, the agency has been bracing for cuts since President Donald Trump took office last month and ordered billionaire Elon Musk and DOGE to slash spending across the federal government. The fears were heightened last week when some NIST workers witnessed a handful of people they believed to be associated with DOGE inside Building 225, which houses the NIST Information Technology Laboratory at the agency's Gaithersburg, Maryland campus, according to multiple people briefed on the sightings. The DOGE staff were seeking access to NIST's IT systems, one of the people said. Soon after the purported visit, NIST leadership told employees that DOGE staffers were not currently on campus, but that office space and technology were being provisioned for them, according to the same people.
Suture Thread Modeling Using Control Barrier Functions for Autonomous Surgery
Forghani, Kimia, Raval, Suraj, Mair, Lamar, Krieger, Axel, Diaz-Mercado, Yancy
Automating surgical systems enhances precision and safety while reducing human involvement in high-risk environments. A major challenge in automating surgical procedures like suturing is accurately modeling the suture thread, a highly flexible and compliant component. Existing models either lack the accuracy needed for safety-critical procedures or are too computationally intensive for real-time execution. In this work, we introduce a novel approach for modeling suture thread dynamics using control barrier functions (CBFs), achieving both realism and computational efficiency. Thread-like behavior, collision avoidance, stiffness, and damping are all modeled within a unified CBF and control Lyapunov function (CLF) framework. Our approach eliminates the need to calculate complex forces or solve differential equations, significantly reducing computational overhead while maintaining a realistic model suitable for both automation and virtual-reality surgical training systems. The framework also allows visual cues to be provided based on the thread's interaction with the environment, enhancing the user experience when performing suture or ligation tasks. The proposed model is tested on the MagnetoSuture system, a minimally invasive robotic surgical platform that uses magnetic fields to manipulate suture needles, offering a less invasive solution for surgical procedures.
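The paper's unified CBF/CLF thread model is not reproduced here, but the core CBF mechanism it builds on can be illustrated with a minimal safety filter for scalar single-integrator dynamics. Everything below (the keep-out function `h`, the gain `alpha`, the closed-form filter) is an illustrative assumption, not the authors' implementation:

```python
def cbf_safety_filter(x, u_nominal, h, dh_dx, alpha=1.0):
    """Minimal CBF filter for scalar single-integrator dynamics x_dot = u.

    Enforces dh/dx * u >= -alpha * h(x), which renders the safe set
    {x : h(x) >= 0} forward-invariant. Assumes dh_dx(x) > 0, so the
    one-constraint QP has the closed-form solution below.
    """
    bound = -alpha * h(x) / dh_dx(x)  # smallest input that stays safe
    if u_nominal >= bound:            # nominal command is already safe
        return u_nominal
    return bound                      # minimally override the command

# Toy keep-out constraint: stay at coordinate x >= 1 (a crude stand-in
# for thread-to-tissue collision avoidance).
h = lambda x: x - 1.0
dh_dx = lambda x: 1.0

# A nominal command of -5.0 would drive straight toward the constraint;
# the filter clips it to the least-restrictive safe value.
u_safe = cbf_safety_filter(x=1.2, u_nominal=-5.0, h=h, dh_dx=dh_dx, alpha=2.0)
```

Because the constraint is enforced pointwise on the input rather than by integrating contact forces, this kind of filter avoids solving differential equations at runtime, which is the computational advantage the abstract highlights.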
Evaluating AI Evaluation: Perils and Prospects
As AI systems appear to exhibit ever-increasing capability and generality, assessing their true potential and safety becomes paramount. This paper contends that the prevalent evaluation methods for these systems are fundamentally inadequate, heightening the risks and potential hazards associated with AI. I argue that a reformation is required in the way we evaluate AI systems and that we should look towards cognitive sciences for inspiration in our approaches, which have a longstanding tradition of assessing general intelligence across diverse species. We will identify some of the difficulties that need to be overcome when applying cognitively-inspired approaches to general-purpose AI systems and also analyse the emerging area of "Evals". The paper concludes by identifying promising research pathways that could refine AI evaluation, advancing it towards a rigorous scientific domain that contributes to the development of safe AI systems.
Generative AI for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations
Fleurence, Rachael, Bian, Jiang, Wang, Xiaoyan, Xu, Hua, Dawoud, Dalia, Fakhouri, Tala, Higashi, Mitch, Chhatwal, Jagpreet
This review introduces the transformative potential of generative Artificial Intelligence (AI) and foundation models, including large language models (LLMs), for health technology assessment (HTA). We explore their applications in four critical areas: evidence synthesis, evidence generation, clinical trials, and economic modeling. (1) Evidence synthesis: Generative AI has the potential to assist in automating literature reviews and meta-analyses by proposing search terms, screening abstracts, and extracting data with notable accuracy. (2) Evidence generation: These models can potentially help automate the analysis of the increasingly available large collections of real-world data (RWD), including unstructured clinical notes and imaging, enhancing the speed and quality of real-world evidence (RWE) generation. (3) Clinical trials: Generative AI can be used to optimize trial design, improve patient matching, and manage trial data more efficiently. (4) Economic modeling: Generative AI can also aid in the development of health economic models, from conceptualization to validation, thus streamlining the overall HTA process. Despite their promise, these technologies, while rapidly improving, are still nascent, and continued careful evaluation of their applications to HTA is required. To ensure their responsible use and implementation, both developers and users of research incorporating these tools should familiarize themselves with their current limitations, including issues related to scientific validity and risk of bias, and should consider equity and ethical implications. We also survey the current policy landscape and provide suggestions for HTA agencies on responsibly integrating generative AI into their workflows, emphasizing the importance of human oversight and the fast-evolving nature of these tools.
ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights
Sarch, Gabriel, Jang, Lawrence, Tarr, Michael J., Cohen, William W., Marino, Kenneth, Fragkiadaki, Katerina
Large-scale generative language and vision-language models (LLMs and VLMs) excel in few-shot in-context learning for decision making and instruction following. However, they require high-quality exemplar demonstrations to be included in their context window. In this work, we ask: Can LLMs and VLMs generate their own prompt examples from generic, sub-optimal demonstrations? We propose In-Context Abstraction Learning (ICAL), a method that builds a memory of multimodal experience insights from sub-optimal demonstrations and human feedback. Given a noisy demonstration in a new domain, VLMs abstract the trajectory into a general program by fixing inefficient actions and annotating cognitive abstractions: task relationships, object state changes, temporal subgoals, and task construals. These abstractions are refined and adapted interactively through human feedback while the agent attempts to execute the trajectory in a similar environment. The resulting abstractions, when used as exemplars in the prompt, significantly improve decision-making in retrieval-augmented LLM and VLM agents. Our ICAL agent surpasses the state-of-the-art in dialogue-based instruction following in TEACh, multimodal web agents in VisualWebArena, and action anticipation in Ego4D. In TEACh, we achieve a 12.6% improvement in goal-condition success. In VisualWebArena, our task success rate improves over the SOTA from 14.3% to 22.7%. In Ego4D action forecasting, we improve over few-shot GPT-4V and remain competitive with supervised models. We show finetuning our retrieval-augmented in-context agent yields additional improvements. Our approach significantly reduces reliance on expert-crafted examples and consistently outperforms in-context learning from action plans that lack such insights.
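A minimal sketch of the retrieval-augmented prompting loop the abstract describes: embed the current task, retrieve the most similar stored abstractions, and prepend them as in-context exemplars. The embedding vectors, memory layout, and function names below are hypothetical stand-ins, not ICAL's actual implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def retrieve_exemplars(task_embedding, memory, k=3):
    """Return the k stored exemplars most similar to the current task.

    `memory` is a list of (embedding, exemplar_text) pairs; in ICAL the
    text would be an abstracted trajectory with its annotations.
    """
    ranked = sorted(memory, key=lambda m: cosine(task_embedding, m[0]),
                    reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(instruction, exemplars):
    """Prepend the retrieved exemplars to the new task instruction."""
    blocks = [f"Example {i + 1}:\n{text}" for i, text in enumerate(exemplars)]
    return "\n\n".join(blocks + [f"Task:\n{instruction}"])

# Toy 2-D embeddings standing in for real multimodal encoder outputs.
memory = [([1.0, 0.0], "pick up the mug, then place it in the sink"),
          ([0.0, 1.0], "open the drawer and retrieve the knife"),
          ([0.9, 0.1], "pick up the cup, then place it on the shelf")]
exemplars = retrieve_exemplars([1.0, 0.0], memory, k=2)
prompt = build_prompt("pick up the bowl", exemplars)
```

The key point the abstract makes is about what goes into `memory`: exemplars that carry abstractions (subgoals, state changes, causal notes) outperform raw action plans retrieved the same way.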
Nuclear Medicine Artificial Intelligence in Action: The Bethesda Report (AI Summit 2024)
Rahmim, Arman, Bradshaw, Tyler J., Davidzon, Guido, Dutta, Joyita, Fakhri, Georges El, Ghesani, Munir, Karakatsanis, Nicolas A., Li, Quanzheng, Liu, Chi, Roncali, Emilie, Saboury, Babak, Yusufaly, Tahir, Jha, Abhinav K.
Arman Rahmim, Departments of Radiology and Physics, University of British Columbia
Tyler J. Bradshaw, Department of Radiology, University of Wisconsin
Guido Davidzon, Department of Radiology, Division of Nuclear Medicine & Molecular Imaging, Stanford University
Joyita Dutta, Department of Biomedical Engineering, University of Massachusetts Amherst
Georges El Fakhri, PET Center, Departments of Radiology & Biomedical Engineering and Bioinformatics & Data Science, Yale University
Munir Ghesani, United Theranostics
Nicolas A. Karakatsanis, Department of Radiology, Weill Cornell Medical College of Cornell University, New York
Quanzheng Li, Center for Advanced Medical Computing and Analysis, Department of Radiology, Massachusetts General Hospital, Harvard Medical School
Chi Liu, Department of Radiology and Biomedical Imaging, Yale University
Emilie Roncali, Departments of Biomedical Engineering and Radiology, University of California, Davis
Babak Saboury, Department of Radiology and Imaging Sciences, Clinical Center, National Institutes of Health
Tahir Yusufaly, Russell H. Morgan Department of Radiology and Radiological Sciences, Johns Hopkins School of Medicine
Abhinav K. Jha, Department of Biomedical Engineering and Mallinckrodt Institute of Radiology, Washington University, St. Louis
Abstract
The 2nd SNMMI Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Bringing together various community members and stakeholders, and following up on a prior successful 2022 AI Summit, the summit theme was "AI in Action".
Six key topics included (i) an overview of prior and ongoing efforts by the AI Task Force, (ii) emerging needs and tools for computational nuclear oncology, (iii) new frontiers in large language and generative models, (iv) defining the value proposition for the use of AI in nuclear medicine, (v) open science, including efforts for data and model repositories, and (vi) issues of reimbursement and funding. The primary efforts, findings, challenges, and next steps are summarized in this manuscript.
Introduction
The Society of Nuclear Medicine & Molecular Imaging (SNMMI) 2nd Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Over 100 community members and stakeholders from academia, healthcare, industry, and the NIH gathered to discuss the emerging role of AI in nuclear medicine. The summit featured two plenaries, panel discussions, and talks from leading experts in the field, and concluded with a round-table discussion on key findings, next steps, and a call to action.
Are they REALLY taking AI seriously? Biden's flagship artificial intelligence safety lab is found to be riddled with black mold, pests and a leaky roof
With only a modest $10 million budget to help regulate an industry of billionaires, Biden's new AI safety lab is now struggling with just the safety of its own facilities. 'Chronic underfunding' of the National Institute of Standards and Technology (NIST), the federal lab that will house the new US AI Safety Institute, has produced black mold, leaky ceilings, and a dead technician crushed by a concrete slab, reports say. Despite calls from scientists and entrepreneurs who have described 'the risk of extinction from AI' as on par with 'pandemics and nuclear war,' GOP deficit hawks in Congress pushed for a 10-percent budget cut to NIST -- and Biden approved. One former senior NIST official reported seeing 'Home Depot dehumidifiers or portable AC units all over the place' bought by staff to help dry and slow the mold. Another reported incessant indoor leaks during rainy weather that required staff to 'tarp up' critical electronic equipment.
Density-based Isometric Mapping
Yousefi, Bardia, Khansari, Mélina, Trask, Ryan, Tallon, Patrick, Carino, Carina, Afrasiyabi, Arman, Kundra, Vikas, Ma, Lan, Ren, Lei, Farahani, Keyvan, Hershman, Michelle
The isometric mapping method employs the shortest-path algorithm to estimate the Euclidean distance between points on high-dimensional (HD) manifolds. This may not be sufficient for weakly uniform HD data, as it can lead to overestimating distances between far neighboring points, resulting in inconsistencies between the intrinsic (local) and extrinsic (global) distances during the projection. To address this issue, we modify the shortest-path algorithm by adding a novel constraint inspired by the Parzen-Rosenblatt (PR) window, which helps to maintain the uniformity of the constructed shortest-path graph in Isomap. Multiple imaging datasets totaling 72,236 cases (70,000 MNIST images, 1,596 images from multiple chest X-ray pneumonia datasets, and three NSCLC CT/PET datasets with a total of 640 lung cancer patients) were used to benchmark and validate PR-Isomap. 431 imaging biomarkers were extracted from each modality. Our results indicate that PR-Isomap projects HD attributes into a lower-dimensional (LD) space while preserving information, as visualized on the MNIST dataset, indicating that local and global distances are maintained. PR-Isomap achieved the highest comparative accuracies of 80.9% (STD: 5.8) for pneumonia and 78.5% (STD: 4.4), 88.4% (STD: 1.4), and 61.4% (STD: 11.4) for the three NSCLC datasets, with 95% confidence intervals for outcome prediction. Similarly, the multivariate Cox model showed better overall-survival prediction, measured with c-statistics and the log-likelihood test, for PR-Isomap compared with other dimensionality reduction methods. Kaplan-Meier survival curves also demonstrate the notable ability of PR-Isomap to distinguish between high-risk and low-risk patients using multimodal imaging biomarkers while preserving HD imaging characteristics for precision medicine.
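The density-constrained shortest-path idea can be sketched as follows. This toy version replaces the actual PR-window kernel with a simple neighbor count inside a fixed radius; `eps`, `density_radius`, and `min_density` are illustrative parameters, not the paper's:

```python
import heapq
import math

def geodesic_distances(points, eps, density_radius, min_density):
    """Isomap-style all-pairs geodesic distances with a density constraint.

    An edge (i, j) enters the neighborhood graph only if the two points are
    within `eps` of each other AND each endpoint has at least `min_density`
    neighbors inside `density_radius` -- a crude stand-in for the
    Parzen-Rosenblatt window criterion that keeps the graph locally uniform.
    """
    n = len(points)
    # Local density: neighbor count inside the PR-style window.
    density = [sum(1 for j in range(n)
                   if j != i and math.dist(points[i], points[j]) <= density_radius)
               for i in range(n)]
    # Build the constrained neighborhood graph.
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d <= eps and density[i] >= min_density and density[j] >= min_density:
                adj[i].append((j, d))
                adj[j].append((i, d))
    # All-pairs shortest paths via Dijkstra from every source.
    G = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        G[s][s] = 0.0
        pq = [(0.0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > G[s][u]:
                continue
            for v, w in adj[u]:
                if d + w < G[s][v]:
                    G[s][v] = d + w
                    heapq.heappush(pq, (d + w, v))
    return G

# Four collinear points: the geodesic 0 -> 3 must walk through the chain.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
G = geodesic_distances(line, eps=1.5, density_radius=1.5, min_density=1)
```

In full Isomap the matrix `G` would then be fed to classical MDS to obtain the low-dimensional embedding; the density constraint only changes which edges the geodesics may traverse.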
Out-of-Distribution Detection and Data Drift Monitoring using Statistical Process Control
Zamzmi, Ghada, Venkatesh, Kesavan, Nelson, Brandon, Prathapan, Smriti, Yi, Paul H., Sahiner, Berkman, Delfino, Jana G.
Background: Machine learning (ML) methods often fail with data that deviates from their training distribution. This is a significant concern for ML-enabled devices in clinical settings, where data drift may cause unexpected performance degradation that jeopardizes patient safety. Method: We propose an ML-enabled Statistical Process Control (SPC) framework for out-of-distribution (OOD) detection and drift monitoring. SPC is advantageous as it visually and statistically highlights deviations from the expected distribution. To demonstrate the utility of the proposed framework for monitoring data drift in radiological images, we investigated different design choices, including methods for extracting feature representations, drift quantification, and SPC parameter selection. Results: We demonstrate the effectiveness of our framework for two tasks: 1) differentiating axial vs. non-axial computed tomography (CT) images and 2) separating chest X-ray (CXR) images from other modalities. For both tasks, we achieved high accuracy in detecting OOD inputs (0.913 for CT and 0.995 for CXR) and high sensitivity (0.980 for CT and 0.984 for CXR). Our framework was also adept at monitoring data streams and identifying the time at which a drift occurred. In a simulation with 100 daily CXR cases, we detected a drift in the OOD input percentage from 0-1% to 3-5% within two days, while maintaining a low false-positive rate. Through additional experimental results, we demonstrate the framework's data-agnostic nature and independence from the underlying model's structure. Conclusion: We propose a framework for OOD detection and drift monitoring that is agnostic to data, modality, and model. The framework is customizable and can be adapted for specific applications.
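A minimal Shewhart-chart sketch of the SPC idea: derive control limits from an in-distribution baseline statistic, then flag stream values that fall outside them. The baseline values and the 3-sigma rule here are illustrative, not the paper's tuned configuration:

```python
import statistics

def shewhart_limits(baseline, k=3.0):
    """Control limits (mean +/- k*sigma) from an in-control baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return mu - k * sigma, mu + k * sigma

def monitor(stream, lcl, ucl):
    """Indices of stream observations that fall outside the control limits."""
    return [i for i, x in enumerate(stream) if not (lcl <= x <= ucl)]

# Daily fraction of inputs an OOD detector flags, under normal operation.
baseline = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11]
lcl, ucl = shewhart_limits(baseline, k=3.0)

# Day 2 jumps well outside the limits, signaling a drift.
alarms = monitor([0.10, 0.11, 0.30], lcl, ucl)
```

The monitored statistic is what makes the framework data- and model-agnostic: any scalar summary of the feature representation (here, a hypothetical daily OOD rate) can be charted the same way.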